Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: build basic block pred lists before morph #1309

Merged
merged 11 commits into from
Jan 14, 2020

Conversation

AndyAyersMS
Copy link
Member

Build basic block pred lists before morph, instead of after, and add an early
flow optimization pass. Fix up a few places where ref counts or pred lists
were not properly maintained in morph.

The early flow opt pass enhances the local assertion prop run in morph (by
fusing blocks), allows the jit to avoid morphing some unreachable blocks
(thus saving a bit of TP), and lays the groundwork for more aggressive early
branch folding that would be useful eg #27935).

Build basic block pred lists before morph, instead of after, and add an early
flow optimization pass. Fix up a few places where ref counts or pred lists
were not properly maintained in morph.

The early flow opt pass enhances the local assertion prop run in morph (by
fusing blocks), allows the jit to avoid morphing some unreachable blocks
(thus saving a bit of TP), and lays the groundwork for more aggressive early
branch folding that would be useful eg dotnet#27935).
@AndyAyersMS
Copy link
Member Author

Still need to look at TP impact and drill into some diffs.

Initial jit-diffs looks promising, though.

PMI CodeSize Diffs for System.Private.CoreLib.dll, framework assemblies for  default jit
Summary of Code Size diffs:
(Lower is better)
Total bytes of diff: -75413 (-0.19% of base)
    diff is an improvement.
Top file regressions (bytes):
         218 : Microsoft.CodeAnalysis.CSharp.dasm (0.01% of base)
         144 : Microsoft.CodeAnalysis.VisualBasic.dasm (0.00% of base)
         135 : System.Collections.Concurrent.dasm (0.04% of base)
          68 : System.Collections.dasm (0.01% of base)
          62 : System.Drawing.Primitives.dasm (0.16% of base)
          53 : System.Memory.dasm (0.02% of base)
          50 : System.Net.Sockets.dasm (0.03% of base)
          44 : System.Net.Security.dasm (0.03% of base)
          38 : System.Diagnostics.Process.dasm (0.04% of base)
          26 : System.Security.Cryptography.Csp.dasm (0.05% of base)
          19 : System.Net.Requests.dasm (0.02% of base)
          19 : System.Runtime.Serialization.Formatters.dasm (0.02% of base)
          16 : System.Diagnostics.DiagnosticSource.dasm (0.06% of base)
          16 : System.Net.NameResolution.dasm (0.08% of base)
          10 : System.Private.Uri.dasm (0.01% of base)
           9 : System.Net.Ping.dasm (0.06% of base)
           9 : System.Security.Cryptography.Primitives.dasm (0.03% of base)
           8 : System.Runtime.Extensions.dasm (0.01% of base)
           7 : System.Text.RegularExpressions.dasm (0.00% of base)
           6 : System.IO.Compression.dasm (0.01% of base)
Top file improvements (bytes):
      -62631 : Microsoft.Diagnostics.Tracing.TraceEvent.dasm (-2.02% of base)
       -2535 : System.Linq.dasm (-0.26% of base)
       -1364 : System.Data.Common.dasm (-0.09% of base)
       -1078 : System.Threading.Tasks.Dataflow.dasm (-0.13% of base)
       -1030 : Microsoft.CodeAnalysis.dasm (-0.06% of base)
        -583 : System.Diagnostics.TraceSource.dasm (-1.38% of base)
        -568 : xunit.execution.dotnet.dasm (-0.24% of base)
        -488 : NuGet.Protocol.Core.v3.dasm (-0.18% of base)
        -463 : System.Private.CoreLib.dasm (-0.01% of base)
        -407 : System.Private.Xml.dasm (-0.01% of base)
        -368 : NuGet.Versioning.dasm (-1.15% of base)
        -361 : System.Linq.Parallel.dasm (-0.02% of base)
        -341 : Microsoft.DotNet.ProjectModel.dasm (-0.16% of base)
        -313 : Newtonsoft.Json.dasm (-0.04% of base)
        -313 : System.ObjectModel.dasm (-0.32% of base)
        -281 : System.Reflection.Metadata.dasm (-0.07% of base)
        -251 : System.Net.Mail.dasm (-0.12% of base)
        -234 : xunit.performance.metrics.dasm (-1.49% of base)
        -194 : System.Collections.Immutable.dasm (-0.02% of base)
        -173 : Microsoft.CSharp.dasm (-0.06% of base)
83 total files with Code Size differences (58 improved, 25 regressed), 46 unchanged.
Top method regressions (bytes):
        1767 ( 3.87% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - KernelTraceEventParser:EnumerateTemplates(Func`3,Action`1):this
         550 ( 1.72% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - ClrPrivateTraceEventParser:EnumerateTemplates(Func`3,Action`1):this
         200 ( 0.18% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - ApplicationServerTraceEventParser:EnumerateTemplates(Func`3,Action`1):this
         147 ( 0.64% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - StateMachineMethodToClassRewriter:VisitTryStatement(BoundTryStatement):BoundNode:this (7 methods)
         135 ( 5.96% of base) : System.Reflection.Metadata.dasm - MetadataReader:InitializeStreamReaders(byref,ref,byref,byref,byref):this
         125 ( 4.19% of base) : Microsoft.CodeAnalysis.CSharp.dasm - Binder:BindQuery(QueryExpressionSyntax,DiagnosticBag):BoundExpression:this
         111 ( 6.60% of base) : System.Private.CoreLib.dasm - SpanHelpers:LastIndexOfAny(byref,double,double,double,int):int
         109 (11.12% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - Binder:BindIntoSelectorLambda(QueryClauseSyntax,ImmutableArray`1,TypeSymbol,bool,HashSet`1,TypeSymbol,ImmutableArray`1,TypeSymbol,SeparatedSyntaxList`1,bool,DiagnosticBag,byref,byref):BoundQueryLambda:this
         108 ( 5.86% of base) : System.Private.CoreLib.dasm - SpanHelpers:IndexOfAny(byref,double,double,double,int):int
         105 ( 0.50% of base) : Microsoft.CodeAnalysis.dasm - ArrayBuilder`1:SelectDistinct(Func`2):ImmutableArray`1:this (49 methods)
          72 ( 1.68% of base) : System.Private.CoreLib.dasm - TextInfo:ChangeCaseCommon(String):String:this (8 methods)
          70 ( 1.24% of base) : System.Private.Xml.dasm - FunctionInfo`1:CastArguments(IList`1,String,XPathQilFactory):this (7 methods)
          59 ( 1.35% of base) : System.Linq.Parallel.dasm - OrderPreservingPipeliningSpoolingTask`2:SpoolingWork():this (7 methods)
          57 ( 3.71% of base) : System.Memory.dasm - BuffersExtensions:WriteMultiSegment(IBufferWriter`1,byref,Span`1) (7 methods)
          56 ( 1.48% of base) : System.Private.DataContractSerialization.dasm - ClassDataContractCriticalHelper:ImportDataMembers():this
          55 ( 2.68% of base) : Microsoft.CodeAnalysis.CSharp.dasm - SyntaxNormalizer:NeedsSeparator(SyntaxToken,SyntaxToken):bool:this
          52 ( 4.31% of base) : System.Private.CoreLib.dasm - SpanHelpers:IndexOfAny(byref,double,double,int):int
          52 ( 4.29% of base) : System.Private.CoreLib.dasm - SpanHelpers:LastIndexOfAny(byref,double,double,int):int
          49 ( 1.20% of base) : Microsoft.CodeAnalysis.CSharp.dasm - LocalRewriter:RewriteEnumeratorForEachStatement(BoundForEachStatement):BoundStatement:this
          47 ( 0.46% of base) : System.Private.CoreLib.dasm - DefaultBinder:BindToMethod(int,ref,byref,ref,CultureInfo,ref,byref):MethodBase:this
Top method improvements (bytes):
       -2348 (-13.34% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - AspNetTraceEventParser:EnumerateTemplates(Func`3,Action`1):this
        -792 (-13.04% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - ClrRundownTraceEventParser:EnumerateTemplates(Func`3,Action`1):this
        -503 (-1.79% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - ClrTraceEventParser:EnumerateTemplates(Func`3,Action`1):this
        -486 (-14.03% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - SymbolTraceEventParser:EnumerateTemplates(Func`3,Action`1):this
        -432 (-12.40% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - HeapTraceProviderTraceEventParser:EnumerateTemplates(Func`3,Action`1):this
        -414 (-3.16% of base) : System.Reflection.Metadata.dasm - MetadataReader:InitializeTableReaders(MemoryBlock,ubyte,ref,ref):this
        -380 (-7.55% of base) : System.Threading.Tasks.Dataflow.dasm - ObserversState:NotifyObserversOfCompletion(Exception):this (7 methods)
        -346 (-2.73% of base) : System.Linq.dasm - Enumerable:Sum(IEnumerable`1,Func`2):Nullable`1 (35 methods)
        -324 (-12.18% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - ThreadPoolTraceEventParser:EnumerateTemplates(Func`3,Action`1):this
        -268 (-15.71% of base) : System.Private.CoreLib.dasm - TimeZoneInfo:GetIsDaylightSavings(DateTime,AdjustmentRule,DaylightTimeStruct):bool
        -252 (-12.17% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - JSDumpHeapTraceEventParser:EnumerateTemplates(Func`3,Action`1):this
        -235 (-4.17% of base) : Microsoft.CodeAnalysis.dasm - <TransposeInternal>d__3`1:MoveNext():bool:this (7 methods)
        -217 (-2.76% of base) : System.Private.CoreLib.dasm - ReadOnlyDictionary`2:System.Collections.ICollection.CopyTo(Array,int):this (7 methods)
        -210 (-3.57% of base) : System.Private.CoreLib.dasm - ReadOnlyDictionaryHelpers:CopyToNonGenericICollectionHelper(ICollection`1,Array,int) (7 methods)
        -208 (-9.15% of base) : System.Collections.Immutable.dasm - HashBucket:Add(Vector`1,long,IEqualityComparer`1,IEqualityComparer`1,int,byref):HashBucket:this
        -198 (-2.13% of base) : System.Data.Common.dasm - SqlConvert:ChangeTypeForXML(Object,Type):Object
        -190 (-13.53% of base) : Microsoft.CodeAnalysis.dasm - PdbWriter:SetAsyncInfo(int,int,int,ImmutableArray`1,ImmutableArray`1):this
        -186 (-1.90% of base) : Microsoft.CodeAnalysis.dasm - AnalyzerExecutor:ExecuteCodeBlockActionsCore(IEnumerable`1,IEnumerable`1,IEnumerable`1,DiagnosticAnalyzer,SyntaxNode,ISymbol,ImmutableArray`1,SemanticModel,Func`2,CodeBlockAnalyzerStateData):this (6 methods)
        -186 (-12.41% of base) : System.Linq.dasm - Enumerable:LongCount(IEnumerable`1):long (7 methods)
        -186 (-8.90% of base) : System.Linq.dasm - WhereSelectEnumerableIterator`2:GetCount(bool):int:this (7 methods)
Top method regressions (percentages):
          42 (14.05% of base) : System.Collections.Concurrent.dasm - ConcurrentDictionary`2:ThrowIfInvalidObjectValue(Object) (7 methods)
           3 (12.00% of base) : System.Private.CoreLib.dasm - Rune:CompareTo(Rune):int:this
         109 (11.12% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - Binder:BindIntoSelectorLambda(QueryClauseSyntax,ImmutableArray`1,TypeSymbol,bool,HashSet`1,TypeSymbol,ImmutableArray`1,TypeSymbol,SeparatedSyntaxList`1,bool,DiagnosticBag,byref,byref):BoundQueryLambda:this
          12 (10.81% of base) : System.Drawing.Primitives.dasm - Rectangle:Union(Rectangle,Rectangle):Rectangle
          27 (10.38% of base) : Microsoft.CodeAnalysis.dasm - PdbLogger:GetLogHash():ref:this
          10 ( 9.62% of base) : System.Drawing.Primitives.dasm - Color:GetBrightness():float:this
          20 ( 8.77% of base) : System.Drawing.Primitives.dasm - Color:GetHue():float:this
          14 ( 8.14% of base) : System.Collections.dasm - SortedList`2:get_Item(int):long:this
          14 ( 8.14% of base) : System.Private.CoreLib.dasm - ASCIIEncoding:GetByteCount(long,int):int:this
          14 ( 8.00% of base) : System.Collections.dasm - SortedList`2:get_Item(long):long:this
          14 ( 8.00% of base) : System.Net.Http.dasm - HPackEncoder:EncodeStringLiteralValue(String,Span`1,byref):bool
           8 ( 7.92% of base) : Microsoft.CodeAnalysis.CSharp.dasm - LocalRewriter:AddSequencePoint(BoundStatement):BoundStatement:this
          10 ( 7.75% of base) : System.Private.CoreLib.dasm - ASCIIEncoding:GetCharCount(ReadOnlySpan`1):int:this
           8 ( 7.62% of base) : Microsoft.CodeAnalysis.CSharp.dasm - BoundStatementList:Synthesized(CSharpSyntaxNode,ImmutableArray`1):BoundStatementList
           3 ( 7.50% of base) : System.Threading.Tasks.Dataflow.dasm - TargetObserver`1:System.IObserver<TInput>.OnNext(__Canon):this
           9 ( 7.38% of base) : System.Private.CoreLib.dasm - FixedWSTRMarshaler:ConvertToNative(String,long,int)
           3 ( 7.14% of base) : Microsoft.CodeAnalysis.CSharp.dasm - SyntheticBoundNodeFactory:HiddenSequencePoint():BoundStatement:this
           3 ( 7.14% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - SyntheticBoundNodeFactory:HiddenSequencePoint():BoundStatement:this
          13 ( 6.99% of base) : System.Drawing.Primitives.dasm - Rectangle:Intersect(Rectangle,Rectangle):Rectangle
           5 ( 6.85% of base) : System.Reflection.Metadata.dasm - BlobReader:ReadBytes(int):ref:this
Top method improvements (percentages):
          -4 (-80.00% of base) : CommandLine.dasm - ParserSettings:Finalize():this
         -36 (-50.00% of base) : System.Private.CoreLib.dasm - DefaultDecoder:GetChars(ref,int,int,ref,int):int:this (2 base, 1 diff methods)
        -129 (-38.86% of base) : NuGet.Versioning.dasm - VersionRangeComparer:GetHashCode(VersionRangeBase):int:this
        -103 (-37.05% of base) : Microsoft.CodeAnalysis.dasm - DebugSourceDocument:.ctor(String,Guid,ImmutableArray`1,Guid):this
         -55 (-35.48% of base) : NuGet.Versioning.dasm - FloatRange:GetHashCode():int:this
         -78 (-34.98% of base) : NuGet.RuntimeModel.dasm - RuntimeDescription:GetHashCode():int:this
         -50 (-34.72% of base) : NuGet.DependencyResolver.Core.dasm - LibraryRangeCacheKey:GetHashCode():int:this
        -182 (-31.27% of base) : NuGet.Versioning.dasm - VersionComparer:GetHashCode(SemanticVersion):int:this
         -50 (-31.25% of base) : NuGet.RuntimeModel.dasm - RuntimeDependencySet:GetHashCode():int:this
         -50 (-31.06% of base) : NuGet.Packaging.Core.dasm - PackageType:GetHashCode():int:this
         -54 (-27.27% of base) : NuGet.Packaging.Core.Types.dasm - PackageIdentityComparer:GetHashCode(PackageIdentity):int:this
         -11 (-25.58% of base) : System.Private.CoreLib.dasm - MemoryExtensions:BinarySearch(Span`1,ubyte):int
         -11 (-25.58% of base) : System.Private.CoreLib.dasm - MemoryExtensions:BinarySearch(Span`1,short):int
        -138 (-19.71% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - KernelTraceEventParser:add_MemoryProcessMemInfo(Action`1):this
         -60 (-19.42% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - ApplicationServerTraceEventParser:add_MessageLogEventSizeExceeded(Action`1):this
         -60 (-19.42% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - ApplicationServerTraceEventParser:add_ServiceActivationException(Action`1):this
         -60 (-19.42% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - ApplicationServerTraceEventParser:add_ReceiveContextCompleteFailed(Action`1):this
         -60 (-19.42% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - ApplicationServerTraceEventParser:add_ReceiveContextAbandonFailed(Action`1):this
         -60 (-19.42% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - ApplicationServerTraceEventParser:add_ReceiveContextAbandonWithException(Action`1):this
         -60 (-19.42% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - ApplicationServerTraceEventParser:add_QueryCompositionExecuted(Action`1):this
4165 total methods with Code Size differences (3109 improved, 1056 regressed), 204766 unchanged.

@mikedn
Copy link
Contributor

mikedn commented Jan 5, 2020

Cool, I was wondering recently if predecessors can't be computed earlier. With predecessors available perhaps we can make fgMarkAddressExposedLocals a bit more powerful.

What's the cause of the huge "tracing" assembly diff? Struct copies eliminated by assertion prop I guess...

@jkotas jkotas added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jan 6, 2020
@AndyAyersMS
Copy link
Member Author

AndyAyersMS commented Jan 7, 2020

Preliminary look at diffs (just looking at asm, so guesswork here, will take a while to root cause these, especially in the larger methods):

Good cases

  • Struct copy prop: ClrRundownTraceEventParser:EnumerateTemplates(Func3,Action1):this
  • Stopped do -> while: Enumerable:LongCount(IEnumerable1):long

Bad cases

  • Overly aggressive copy prop: SpanHelpers:IndexOfAny(byref,double,double,int):int and similar
  • Duplicated throw block: ConcurrentDictionary2:ThrowIfInvalidObjectValue(Object)
  • Flow opts duplicates code: Rectangle:Union(Rectangle,Rectangle):Rectangle

@AndyAyersMS
Copy link
Member Author

Looking deeper at some of the code size diffs.

Improvements

Enumerable:LongCount(IEnumerable1):long

Do-while is now inhibited because early flow opts fuses an empty block
at the end of a try into its predecessor, so when morph goes to place
a throw helper block it ends up putting it early in the try region
rather than at the end, and this breaks the simple-minded do-while
pattern.

Seems like (for the most part) we'd prefer that these throw helpers
always live at the end of the try region and not start up somewhere
earlier, even if we need to branch around them. We eventually move it
to the end anyways.

Blocking do-while turns out to be beneficial for code size as the loop
exit test code has a (non loop invariant) call to MoveNext which
can't be further optimized or lead to other optimizations.

VersionRangeComparer:GetHashCode(VersionRangeBase):int:this

Early constant prop and branch folding simplifies a join into linear
flow, suppressing phi insertion and unblocking a bunch of later
optimizations.

Likely could also get this with conditionally-aware optimization passes.

MemoryExtensions:BinarySearch(Span1,ubyte):int`

Early copy prop helps avoid some argument shuffling leading up to a call.

TimeZoneInfo:GetIsDaylightSavings(DateTime,AdjustmentRule,DaylightTimeStruct):bool

Inlined DateTime constructors have checks if seconds == 60, which are
all false here; early pruning of this blows away a later join which
inhibits profitable constant prop.

ParserSettings:Finalize():this

Looks like we're discarding a side effect tree under a jump.

Removing conditional jump to next block (BB01 -> BB06)

Removing statement STMT00004 (IL 0x000...  ???)
               [000011] ---XG-------              *  JTRUE     void  
               [000010] ---XG-------              \--*  NE        int   
               [000008] ---XG-------                 +--*  FIELD     bool   disposed
               [000000] ------------                 |  \--*  LCL_VAR   ref    V00 this         
               [000009] ------------                 \--*  CNS_INT   int    0
 in BB01 as useless:

This should be finding side effects to extract. Fix is to teach
OperMayThrow about GT_FIELD.

AspNetTraceEventParser:EnumerateTemplates...

Struct copy prop kicks in pretty heavily... likewise for EnumerateTemplate cases.

Regressions

KernelTraceEventParser:EnumerateTemplates

So why do some EnumerateTemplate cases regress in size? For larger
ones, we run into the tracked local threshold. Early struct copy prop
has boosted ref count for some structs, which makes them tracked and
steals tracking slots for some scalars.

The directly observble effect is extra spills, eg

IN11c8: 006879 mov      rcx, 0xD1FFAB1E
IN11c9: 006883 mov      rcx, gword ptr [rcx]
IN11ca: 006886 mov      gword ptr [V692 rbp-11B8H], rcx     // spill

IN11cb: 00688D mov      rcx, 0xD1FFAB1E
IN11cc: 006897 mov      rcx, gword ptr [rcx]
IN11cd: 00689A vmovdqu  xmm0, xmmword ptr [rcx+8]           // copy
IN11ce: 00689F vmovdqu  xmmword ptr [V513 rbp-1010H], xmm0

IN11cf: 0068A7 mov      rcx, 0xD1FFAB1E
IN11d0: 0068B1 mov      rcx, gword ptr [rcx]
IN11d1: 0068B4 mov      gword ptr [V689 rbp-11A0H], rcx     // spill
IN11d2: 0068BB xor      rcx, rcx
IN11d3: 0068BD mov      gword ptr [V07+0x50 rsp+50H], rcx
IN11d4: 0068C2 mov      rcx, gword ptr [V689 rbp-11A0H]     // reload
IN11d5: 0068C9 mov      gword ptr [V07+0x48 rsp+48H], rcx
IN11d6: 0068CE lea      rcx, bword ptr [V513 rbp-1010H]
IN11d7: 0068D5 mov      bword ptr [V07+0x40 rsp+40H], rcx
IN11d8: 0068DA lea      rcx, bword ptr [V512 rbp-1000H]
IN11d9: 0068E1 mov      bword ptr [V07+0x28 rsp+28H], rcx
IN11da: 0068E6 mov      rcx, gword ptr [V690 rbp-11A8H]
IN11db: 0068ED mov      rdx, gword ptr [V691 rbp-11B0H]
IN11dc: 0068F4 mov      gword ptr [V07+0x20 rsp+20H], rdx
IN11dd: 0068F9 mov      rdx, gword ptr [V692 rbp-11B8H]     // reload
IN11de: 006900 mov      gword ptr [V07+0x38 rsp+38H], rdx
IN11df: 006905 xor      rdx, rdx
IN11e0: 006907 mov      r8d, 0xFFFF
IN11e1: 00690D mov      r9d, 10
IN11e2: 006913 mov      dword ptr [V07+0x30 rsp+30H], 15
IN11e3: 00691B call     MemoryPageFaultTraceData:.ctor(...)

There are no spills in the base version.

Color:GetBrightness():float:this

Early copy prop (in morph) is undone by later copy prop (in optCopyProp):

;; early
               [000065] ------------                 +--*  LCL_VAR   int    V09 tmp3         
               [000066] ------------                 \--*  LCL_VAR   int    V10 tmp4         

Assertion prop in BB01:
Copy     Assertion: V09 == V01 index=#02, mask=0000000000000002
               [000065] ------------              *  LCL_VAR   int    V01 loc0   

;; late

VN based copy assertion for [000065] V01 @000001C2 by [000077] V09 @000001C2.
N001 (  1,  1) [000065] ------------              *  LCL_VAR   int    V01 loc0         u:1 $1c2
copy propagated to:
N001 (  1,  1) [000065] ------------              *  LCL_VAR   int    V09 tmp3         u:1 $1c2

leading to worse codegen. Base jit just copy props once, late.

optCopyProp seems to prefer always making a copy; one might imagine it should instead
consult ref counts, if available, and prefer the more heavily or frequently weighted local.

Rectangle:Union(Rectangle,Rectangle):Rectangle

Similar to above, an early copy prop is undone later.

Binder:BindQuery(QueryExpressionSyntax,DiagnosticBag):BoundExpression:this

Some undone copy props; one extra CSE. Allocator spills more, includes some odd sequences like

IN021b: 000A4E mov      rax, gword ptr [V101 rsp+68H]
IN021c: 000A53 mov      byte  ptr [rax+17], 1
IN021d: 000A57 mov      gword ptr [V101 rsp+68H], rax

@AndyAyersMS
Copy link
Member Author

Still have an arm assert to chase down -- can't get it to repro with altjits.

@AndyAyersMS AndyAyersMS marked this pull request as ready for review January 9, 2020 00:26
@AndyAyersMS
Copy link
Member Author

@dotnet/jit-contrib PTAL

@BruceForstall BruceForstall self-requested a review January 9, 2020 01:40
Copy link
Contributor

@jashook jashook left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm cool change!

@AndyAyersMS
Copy link
Member Author

Throughput (pin icount on release crossgen spc, median of 5 runs each) is basically a wash....

base = 19126398630
diff = 19114462152
diff/base = 0.999

Copy link
Contributor

@CarolEidt CarolEidt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really nice change! Just a few little things

// the first block is not scratch and is targeted by a
// branch.
assert(fgFirstBB->bbRefs >= 1);
fgFirstBB->bbRefs--;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you're here, it works be nice to update the header, including when and under what pre-conditions it will be called.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do.

// Don't build address modes out of non-foldable constants
if (!op2->AsIntConCommon()->ImmedValCanBeFolded(compiler, addr->OperGet()))
{
return false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated change?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this is a necessary fix, otherwise we fold handle + const into an LEA on x86, and lose track of a reloc.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see; so it was an existing issue that was exposed by this change?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sort of ... looking at this again perhaps I should fix it differently.

VN-based constant prop won't propagate handles for 32 bit targets when opts.compReloc is true. But assertion-based constant prop doesn't have this restriction. And so with this change, the early flow opts enable an assertion-based constant prop of a handle, and on x86 constant handle is fodder for LEA formation.

So maybe I should update optConstantAssertionProp with similar restrictions as vn based const prop and revert the codegen change (or turn it into an assert).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I will split this part of as a separate change, which should be zero-diff and will allow bit more refactoring than I'd do if I tried folding it in here. Should have that PR up soon.


if (fldObj != nullptr)
{
return comp->fgAddrCouldBeNull(fldObj);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was presumably needed because this is now called before morph has removed these?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.


// Run an early flow graph simplification pass
if (opts.OptimizationEnabled())
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize this was just cut-and-pasted, but it would be nice to update the comment style.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.

FWIW there are a lot of old-style comments throughout fgMorph. And I'm tempted to "inline" fgMorph into compCompile so that most of the phases in the jit are captured in one method. Maybe I'll do this bigger change as a follow-up.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds good.

AndyAyersMS added a commit to AndyAyersMS/runtime that referenced this pull request Jan 9, 2020
…relocs

This gives local constant prop the same behavior as value-number-based constant
prop; both now block handle propagation when the jit must generate relocs (R2R),
to ensure that handle references appear only in simple instruction forms like movs.

Prerequisite to dotnet#1309, which can enable such propagation.
AndyAyersMS added a commit that referenced this pull request Jan 10, 2020
…relocs (#1593)

This gives local constant prop the same behavior as value-number-based constant
prop; both now block handle propagation when the jit must generate relocs (R2R),
to ensure that handle references appear only in simple instruction forms like movs.

Prerequisite to #1309, which can enable such propagation.
@AndyAyersMS
Copy link
Member Author

Still chasing down one last issue related to tail call validation that came up in offline testing.

@BruceForstall
Copy link
Member

You should probably trigger outerloop jitstress jobs when doing final testing. I'm not sure what shape those jobs are in, though: as far as I can tell, "runtime-coreclr jitstress" has never actually succeeded: https://dev.azure.com/dnceng/public/_build?definitionId=658&_a=summary. You might want to try a Pri-1 normal / JitStress=2 run locally, to be safe.

@AndyAyersMS
Copy link
Member Author

Are local tests broken?

I can't get any incantation of src\coreclr\tests\runtest.cmd working; it is looking for "dotnet.cmd" and other stuff that now seems to be gone.

@BruceForstall
Copy link
Member

@AndyAyersMS Are you up-to-date? Works for me. Note that dotnet.cmd has moved to the repo root.

@AndyAyersMS
Copy link
Member Author

Not in that repo, no. I would prefer not to rebase just yet. runtest.py seems to work ok.

@AndyAyersMS
Copy link
Member Author

Ran jitstress=2 locally on the pri-1 tests, and there were 39 failures. I don't know the baseline number though the rolling runs seem to have maybe 30 failures. At least one failure mode looks related so I'll investigate further:

Assert failure(PID 162308 [0x00027a04], Thread: 169500 [0x2961c]): Assertion failed 'topBB->bbNum <= botBB->bbNum' in 'JitTest.TestClass:Method3(JitTest.ValueClass):System.String' (IL size 39)

    File: E:\repos\runtime0\src\coreclr\src\jit\optimizer.cpp Line: 3931
    Image: E:\repos\runtime0\artifacts\tests\coreclr\Windows_NT.x64.Checked\Tests\Core_Root\CoreRun.exe

@AndyAyersMS
Copy link
Member Author

Building tests via build.cmd rather than via build-test.cmd seems to go better.

I believe that last commit fixes all the unique JitStress2 pri1 failures, though on rerun I got some new failures that appear to be unrelated (missing mscorrc). So re-rerunning....

Baseline run has 28 failing tests. Probably worth digging in and trying to get those cleaned up.

@AndyAyersMS
Copy link
Member Author

Re-run of this change also has 28 failures...

@AndyAyersMS
Copy link
Member Author

@BruceForstall any other suggestions for testing?

@BruceForstall
Copy link
Member

Normally, I'd suggest kicking off the AzDO jitstress pipeline so you get all-platform coverage. I don't know what the state of these are currently.

@AndyAyersMS
Copy link
Member Author

Looks like there are ~950 or so baseline failures....

/azp run runtime-coreclr jitstress

@AndyAyersMS
Copy link
Member Author

Got a partial jitstress run through. Doesn't look like there are any unique failures, though it is hard to be 100% sure.

@AndyAyersMS
Copy link
Member Author

@BruceForstall waiting on you to sign off or suggest other tests to run.

@BruceForstall
Copy link
Member

Will look soon

return P1(o);
}

// F3 needs to be too big to inline without being marked noinline,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is one of those tests that seems like it could be easily not testing what we want it to test with a few changes/improvements to JIT capabilities and heuristics. And we wouldn't know (or be notified about) when that happens. I wonder if there would be some way to specify for a test/function "I expect this to tail call", or "I expect this to be inlined", for example, and if it doesn't happen, the test fails (e.g., JIT assert).

@@ -2729,13 +2729,31 @@ inline unsigned Compiler::fgThrowHlpBlkStkLevel(BasicBlock* block)
*/
inline void Compiler::fgConvertBBToThrowBB(BasicBlock* block)
{
// If we're converting a BBJ_CALLFINALLY block to a BBJ_THROW block,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment not related to your change: the size of some of these "inline" functions has gotten ridiculous. Seems like we should get rid of the ".hpp" file entirely and assume a good compiler will inline appropriately.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, was going to move this one in particular, but perhaps they all should go.

Perhaps part of some broader breakup of the bigger files into related sets of functionality or something.

//
// We maintain the invariant that a scratch BB ends with BBJ_NONE or
// BBJ_ALWAYS, so that when adding independent bits of initialization,
// callers can generally append to the fgFirstBB block without worring
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: worring => worrying

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't see these before merging; will fix the typos in some future change.

@@ -453,6 +461,15 @@ void Compiler::fgEnsureFirstBBisScratch()
block->inheritWeight(fgFirstBB);
}

// The first block has an implicit ref count which we must
// remove. Note the ref count could be greater that one, if
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

grammar-o: that => than

void Compiler::fgEnsureFirstBBisScratch()
{
// This method does not update predecessor lists and so must only be called before they are computed.
assert(!fgComputePredsDone);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this function now be called both before and after preds are computed? Or only after? Doesn't preds computation set bbRefs, and you assert on that, so it has to be after? In which case, could you assert(fgComputePredsDone)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can be called both before and after.

bbRefs gets set early on, before preds, and is mostly right, for most blocks.

@AndyAyersMS AndyAyersMS merged commit 7a6a83f into dotnet:master Jan 14, 2020
@AndyAyersMS AndyAyersMS deleted the BuildPredListsBeforeMorph branch January 14, 2020 19:25
AndyAyersMS added a commit to AndyAyersMS/runtime that referenced this pull request Jan 16, 2020
Assertion prop uses `O2K_CONST_INT` for handles, even on 64 bit targets. Make
suitable adjustments to the handle propagation blocking logic.

Problem was introduced in dotnet#1593 and exposed by dotnet#1309.

Fixes dotnet#1777
AndyAyersMS added a commit that referenced this pull request Jan 16, 2020
Assertion prop uses `O2K_CONST_INT` for handles, even on 64 bit targets. Make
suitable adjustments to the handle propagation blocking logic.

Problem was introduced in #1593 and exposed by #1309.

Fixes #1777
AndyAyersMS added a commit to AndyAyersMS/runtime that referenced this pull request Jan 23, 2020
This gives us a single method that controls most of the jit's phase behavior.
Largely a manual inline, though I updated some comments and changed one assert.

Follow up from dotnet#1309; also see dotnet#2109.
AndyAyersMS added a commit that referenced this pull request Jan 25, 2020
This gives us a single method that controls most of the jit's phase behavior.
Largely a manual inline, though I updated some comments and changed one assert.

Follow up from #1309; also see #2109.
@AndyAyersMS AndyAyersMS added optimization tenet-performance Performance related issue labels Mar 7, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 11, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI optimization tenet-performance Performance related issue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants